TABLE OF CONTENTS

1. Import Libraries

2. Assess Data Properties

3. Univariate Exploration

4. Bivariate Exploration

5. Multivariate Exploration

6. References

IMPORT NECESSARY LIBRARIES

Assess Data Properties

The number of dropped rows does not significantly affect our dataset, so we can proceed with further analysis before we start visualising.
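The cell that dropped the rows is not reproduced here. A minimal sketch of how the share of dropped rows could be checked, assuming a pandas DataFrame; the name `loans`, the column names and every value below are made up for illustration:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the loan sub-dataset (hypothetical values).
loans = pd.DataFrame({
    "LoanOriginalAmount": [4000, 10000, 15000, 2500, 7000],
    "BorrowerAPR": [0.31, 0.18, np.nan, 0.35, 0.22],
    "EmploymentStatus": ["Employed", "Employed", "Full-time", None, "Retired"],
})

rows_before = len(loans)
loans_clean = loans.dropna()  # drop every row that has at least one null
rows_dropped = rows_before - len(loans_clean)
share_dropped = rows_dropped / rows_before

print(f"Dropped {rows_dropped} of {rows_before} rows ({share_dropped:.1%})")
```

On the real data, a small `share_dropped` is what would justify saying the dropped rows do not significantly affect the dataset.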

What is the structure of your dataset?

This cleaned dataset includes details on 76224 loans across 14 variables. The majority of the variables are numerical, while Loan Status is a nominal categorical variable.
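A short sketch of how the structure described above could be inspected, assuming a pandas DataFrame. The toy frame below has only three rows and three columns; the column names and values are assumptions for illustration:

```python
import pandas as pd

# Toy stand-in for the cleaned dataset (hypothetical values).
loans = pd.DataFrame({
    "LoanOriginalAmount": [4000, 10000, 15000],
    "BorrowerAPR": [0.31, 0.18, 0.12],
    "LoanStatus": ["Current", "Completed", "Defaulted"],
})

# A nominal variable: its categories have no inherent order.
loans["LoanStatus"] = loans["LoanStatus"].astype("category")

n_rows, n_cols = loans.shape
numeric_cols = loans.select_dtypes(include="number").columns.tolist()

print(n_rows, n_cols)
print(numeric_cols)
print(loans.dtypes)
```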

What is/are the main feature(s) of interest in your dataset?

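The main features of interest are the original loan amount (LoanOriginalAmount), the borrower annual percentage rate (BorrowerAPR) and the borrower interest rate (BorrowerRate). The goal is to find out which factors affect them.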

What features in the dataset do you think will help support your investigation into your feature(s) of interest?

I predict that the borrower's stated monthly income, the original loan amount requested, the employment status and the occupation will affect the features of interest.

Now let's start exploring! 🕺🏾🕺🏾🕺🏾

Univariate Exploration of Some Selected Variables

Question 1

What is the distribution of the main variables of interest?

Visualisation

Observations

The distributions of BorrowerAPR and BorrowerRate are both multimodal.

Question

What amount is most borrowed?

Visualisation

Observation

4k, 10k and 15k dollars are the most frequently borrowed amounts on the Prosper loan app.

From the histogram above, the stated monthly income is skewed to the right, with the majority of Prosper clients earning below 15k dollars per month.

To confirm this, we compute the percentage of borrowers that earn 15k dollars or less per month.

97.5% of the borrowers stated a monthly income of 15k dollars or below.
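A percentage like the one above can be obtained by averaging a boolean mask. This is a minimal sketch on made-up incomes, not the computation from the original cell; the series name follows the StatedMonthlyIncome column:

```python
import pandas as pd

# Hypothetical monthly incomes in dollars.
income = pd.Series(
    [2500, 4000, 5200, 6100, 8000, 9500, 12000, 15000, 18000, 40000],
    name="StatedMonthlyIncome",
)

# The mean of a boolean mask is the fraction of True values.
share_at_or_below_15k = (income <= 15_000).mean()

print(f"{share_at_or_below_15k:.1%} of borrowers state 15k or less")
```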

The debt-to-income ratio is slightly skewed to the right, but it is close enough to a normal distribution to be treated as approximately normal.
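One way to back up a visual impression of skew is to compare the mean with the median and to look at the sample skewness. The sketch below uses made-up ratios, and the column name `DebtToIncomeRatio` is an assumption:

```python
import pandas as pd

# Hypothetical debt-to-income ratios with one large value on the right.
dti = pd.Series(
    [0.10, 0.15, 0.20, 0.20, 0.25, 0.30, 0.35, 0.90],
    name="DebtToIncomeRatio",
)

skewness = dti.skew()   # > 0 suggests a longer right tail
mean_dti = dti.mean()
median_dti = dti.median()

print(f"skew={skewness:.2f}, mean={mean_dti:.3f}, median={median_dti:.3f}")
```

A skewness near zero, with mean and median close together, would support treating the distribution as roughly normal.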

Let's have a count of the employment status of the borrowers.

Question 2

What is the occupation type count of the borrowers?

Visualisation

Observations

Professional

Executive

Computer Programmer

Teacher
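The counts behind a bar chart of occupations can be sketched with `value_counts`. The occupations below are a made-up sample and the column name `Occupation` is an assumption:

```python
import pandas as pd

# Hypothetical sample of borrower occupations.
occupation = pd.Series(
    ["Professional", "Teacher", "Professional", "Executive",
     "Computer Programmer", "Professional", "Executive"],
    name="Occupation",
)

# value_counts sorts from most to least frequent by default.
occupation_counts = occupation.value_counts()
most_common = occupation_counts.index[0]

print(occupation_counts)
```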

Discuss the distribution(s) of your variable(s) of interest. Were there any unusual points? Did you need to perform any transformations?

The Borrower APR is slightly multimodal, with values between 0.05 and 0.4.

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

The distribution of stated monthly income is skewed to the right, and about 97% of the borrowers earn 15k dollars or less per month.

Another observation shows that most of the borrowers are employed.

To achieve the goal of my analysis on the sub-dataset, I dropped all rows with null values, since there were too few of them to bias the final result.

Bivariate Exploration

Question 1

Will a higher loan amount attract a lower BorrowerAPR? I predict it should, but don't bank on my assumption; let the data tell us graphically.

Visualisation

Observation

We observed a negative correlation between Loan Original Amount and Borrower APR. This means that, as I predicted earlier, higher loan amounts come with a lower borrower annual percentage rate.
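The direction of the relationship seen in the scatter plot can also be summarised with a Pearson correlation coefficient. A minimal sketch on made-up values (the real coefficient is not reported here):

```python
import pandas as pd

# Hypothetical loans: larger amounts paired with lower APRs.
loans = pd.DataFrame({
    "LoanOriginalAmount": [2000, 4000, 10000, 15000, 25000, 35000],
    "BorrowerAPR": [0.35, 0.30, 0.24, 0.20, 0.15, 0.10],
})

# Series.corr uses the Pearson method by default.
r = loans["LoanOriginalAmount"].corr(loans["BorrowerAPR"])

print(f"Pearson r = {r:.2f}")
```

A negative `r` matches a downward-sloping trendline; its magnitude says how tightly the points follow it.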

Observations

  1. 50% of current loans are below 10k dollars, and the highest is within the 35k dollar range
  2. 75% of defaulted loans fall within 5k - 10k dollars
  3. 75% of loans that are past due (16-30 days) are below 15k dollars
  4. The highest defaulted loan amount is 15k dollars
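Percentile statements like the ones in this list are usually read off a box plot, but they can also be tabulated per loan status. A sketch on made-up loans; the column name `LoanStatus` is an assumption:

```python
import pandas as pd

# Hypothetical loans for two status groups.
loans = pd.DataFrame({
    "LoanStatus": ["Current"] * 4 + ["Defaulted"] * 4,
    "LoanOriginalAmount": [4000, 8000, 12000, 35000, 3000, 5000, 9000, 15000],
})

# describe() reports count, mean, std, min, 25%, 50%, 75% and max per group.
summary = loans.groupby("LoanStatus")["LoanOriginalAmount"].describe()

print(summary[["50%", "75%", "max"]])
```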

Question 2

How is employment status distributed across the different Prosper ratings?

Visualisation

Observation

1. There is not enough data for the Part-time, Retired, Self-employed and Not employed statuses to show how Employment Status interacts with ProsperRating (Alpha).

2. Most of the employed borrowers were C-rated, followed by B and A respectively. Fewer than 5000 borrowers had the highest rating of AA.
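The counts behind a clustered bar chart of two categorical variables can be sketched with a cross-tabulation. The rows below are made up, and `EmploymentStatus` is an assumed column name:

```python
import pandas as pd

# Hypothetical borrowers.
loans = pd.DataFrame({
    "EmploymentStatus": ["Employed"] * 6 + ["Full-time", "Retired"],
    "ProsperRating (Alpha)": ["C", "C", "C", "B", "B", "A", "C", "D"],
})

# Rows: employment status, columns: rating, cells: number of borrowers.
rating_by_status = pd.crosstab(
    loans["EmploymentStatus"], loans["ProsperRating (Alpha)"]
)
top_rating_for_employed = rating_by_status.loc["Employed"].idxmax()

print(rating_by_status)
```

Rows with very small totals in such a table are the ones for which the chart cannot say much.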

Question 3

Does employment status influence the amount of loan requested?

Visualisation

Observation

Borrowers with Employed, Self-employed and Other employment status borrow higher amounts than Part-time, Retired, Full-time and Not employed borrowers.

Question 4

Do higher income earners borrow more money?

Visualisation

Observation

There is no clear positive correlation between stated monthly income and the original loan amount requested.

Question 5

Does any form of correlation exist between LoanOriginalAmount and BorrowerRate?

Visualisation

Observation

There is a negative correlation between LoanOriginalAmount and BorrowerRate.

As I had expected, the interest rate is lower for higher loan amounts; the trendline of the scatter plot shows this negative correlation.

Question 6

How does the BorrowerAPR compare to the loan Term?

Visualisation

Observation

Loans with a 36-month term have a higher BorrowerAPR than loans with a 12- or 60-month term.

Question 7

Does loan term influence the BorrowerRate? I am assuming that shorter loans should attract a higher interest rate.

Visualisation

Observation

75% of 12-month term loans have an interest rate below 20%.

This was a bit surprising for me. I had envisaged that shorter-term loans would come with higher interest, but the data says otherwise: longer-term loans attracted higher interest.

I assume the management of Prosper Loan App adopted this strategy to encourage faster loan repayment.
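The 75th-percentile figure quoted in the observation above is the upper edge of the box in a box plot. A sketch of how it could be computed per term, using made-up rates:

```python
import pandas as pd

# Hypothetical interest rates, five loans per term.
loans = pd.DataFrame({
    "Term": [12] * 5 + [36] * 5 + [60] * 5,
    "BorrowerRate": [
        0.08, 0.10, 0.15, 0.18, 0.25,   # 12 months
        0.12, 0.20, 0.26, 0.30, 0.33,   # 36 months
        0.15, 0.19, 0.22, 0.25, 0.28,   # 60 months
    ],
})

# 75% of the loans in each term have a rate at or below this value.
q75_by_term = loans.groupby("Term")["BorrowerRate"].quantile(0.75)

print(q75_by_term)
```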

Multivariate Exploration

My key interest here is to investigate how the relationship between LoanOriginalAmount and BorrowerAPR is impacted by categorical variables like Term and ProsperRating (Alpha).

As a bonus, I will also explore the same impact on the relationship between LoanOriginalAmount and BorrowerRate.

Question 1

What is the impact of Term on the relationship between Loan Amount and Borrower APR, using regplot? (*Bonus: Replace Borrower APR with BorrowerRate and observe if the trend is similar.)

Visualisation

Observation

Generally, there is a negative correlation between LoanOriginalAmount and BorrowerRate for all 3 terms. A similar trend can be observed between LoanOriginalAmount and BorrowerAPR.
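The per-term trend that the regression panels show can be cross-checked numerically by computing one correlation per term. This is a sketch on made-up loans, not a reproduction of the original plots:

```python
import pandas as pd

# Hypothetical loans, four per term.
loans = pd.DataFrame({
    "Term": [12] * 4 + [36] * 4 + [60] * 4,
    "LoanOriginalAmount": [
        2000, 4000, 6000, 8000,
        3000, 8000, 15000, 25000,
        10000, 15000, 25000, 35000,
    ],
    "BorrowerRate": [
        0.20, 0.17, 0.15, 0.12,
        0.30, 0.25, 0.21, 0.15,
        0.26, 0.22, 0.19, 0.14,
    ],
})

# One Pearson coefficient per loan term.
corr_by_term = {
    term: group["LoanOriginalAmount"].corr(group["BorrowerRate"])
    for term, group in loans.groupby("Term")
}

for term, r in corr_by_term.items():
    print(f"Term {term}: r = {r:.2f}")
```

Swapping `BorrowerRate` for `BorrowerAPR` gives the comparison described in the bonus part of the question.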

Question 2

How does Prosper Rating affect the relationship between Borrower APR (Annual Percentage Rate) and Loan Original Amount?

Visualisation

Observation

For highly rated borrowers, the Borrower APR and Loan Original Amount have a positive link. However, the relationship becomes negative as the rating drops from AA to HR. I believe that Prosper executives purposefully increased the APR for high-rated customers as the loan amount requested increased in order to maximise returns from the transaction (possibly because these customers have been with them for a long time and are already loyal to the brand). In contrast, those with lower Prosper ratings get lower APRs as the loan amount increases. I think this is done on purpose to entice new clients, who most likely have low ratings, to try out the service.
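The sign change described above can be made explicit by fitting one straight line per rating and reading off its slope. The sketch below uses NumPy's `polyfit` on made-up loans for two ratings only; it illustrates the check and does not reproduce the real fit:

```python
import numpy as np
import pandas as pd

# Hypothetical loans for the best and the worst rating.
loans = pd.DataFrame({
    "ProsperRating (Alpha)": ["AA"] * 4 + ["HR"] * 4,
    "LoanOriginalAmount": [5000, 10000, 20000, 30000, 2000, 4000, 6000, 8000],
    "BorrowerAPR": [0.07, 0.08, 0.09, 0.10, 0.38, 0.37, 0.36, 0.35],
})

# A degree-1 polyfit returns [slope, intercept]; keep the slope per rating.
slope_by_rating = {
    rating: np.polyfit(
        group["LoanOriginalAmount"].to_numpy(dtype=float),
        group["BorrowerAPR"].to_numpy(dtype=float),
        1,
    )[0]
    for rating, group in loans.groupby("ProsperRating (Alpha)")
}

for rating, slope in slope_by_rating.items():
    print(f"{rating}: APR change per dollar borrowed = {slope:.2e}")
```

A positive slope means APR rises with the loan amount for that rating; a negative slope means it falls.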

Question 3

How does Prosper Rating affect the relationship between Borrower Rate (Interest Rate) and Loan Original Amount?

Visualisation

Observation

A similar conclusion can be drawn for the relationship between Loan Original Amount and BorrowerRate as for Loan Original Amount vs BorrowerAPR above.

Question 4

Using Seaborn pointplot, can we see how loan term affects the relationship between ProsperRating (Alpha) and BorrowerAPR?

Visualisation

Observation

Highly rated borrowers (AA-B) have a lower APR, though it rises incrementally as the loan term increases from 12 to 60 months. Poorly rated borrowers attract a higher APR.
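A point plot draws, by default, the mean of the numeric variable for each category, so the same comparison can be tabulated directly. The sketch below also declares the rating as an ordered categorical so that rows sort from best to worst; the rating order AA to HR and all values are assumptions for illustration:

```python
import pandas as pd

# Hypothetical loans, deliberately listed with the worst rating first.
loans = pd.DataFrame({
    "ProsperRating (Alpha)": ["HR"] * 4 + ["C"] * 4 + ["AA"] * 4,
    "Term": [12, 12, 60, 60] * 3,
    "BorrowerAPR": [
        0.37, 0.39, 0.34, 0.36,   # HR
        0.20, 0.22, 0.21, 0.23,   # C
        0.07, 0.09, 0.11, 0.13,   # AA
    ],
})

# Ordered categorical: AA is the best rating, HR the worst.
rating_order = ["AA", "A", "B", "C", "D", "E", "HR"]
loans["ProsperRating (Alpha)"] = pd.Categorical(
    loans["ProsperRating (Alpha)"], categories=rating_order, ordered=True
)

# Mean APR per rating (rows) and term (columns); observed=True keeps
# only the ratings that actually occur in the data.
mean_apr = loans.pivot_table(
    index="ProsperRating (Alpha)",
    columns="Term",
    values="BorrowerAPR",
    aggfunc="mean",
    observed=True,
)

print(mean_apr)
```

Reading across a row shows how the mean APR moves with the term for that rating; reading down a column shows how it moves with the rating.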

Question 5

Using Seaborn pointplot, can we see how loan term affect the relationship between Employment Status and BorrowerAPR?

Visualisation

Observation

Highly rated borrowers (AA-B) have a lower APR, though it rises incrementally as the loan term increases from 12 to 60 months. Poorly rated borrowers attract a higher APR.

Question 6

Visualise the impact of Term on the relationship between ProsperRating (Alpha) & LoanOriginalAmount; and ProsperRating (Alpha) & StatedMonthlyIncome

Visualisation

Observation

Borrowers with a high monthly income and a high Prosper rating tend to take loans with a 12-month term.

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

For highly rated borrowers, the Borrower APR and Loan Original Amount have a positive link. However, the relationship becomes negative as the rating drops from AA to HR.

Further exploration of the influence of loan term and Prosper rating on the original loan amount shows that the amount increases with better ratings for all three terms.

Were there any interesting or surprising interactions between features?

Unexpectedly, the borrower APR and loan amount have a negative link when the borrower's Prosper rating is between HR and B, but a positive correlation when the borrower's rating is A or AA. Another intriguing finding is that for borrowers with HR-C ratings, the borrower APR decreases as the loan term lengthens. However, the APR rises with the length of the loan for those with B-AA ratings.

References

1. Pmalo46

2. Amyra Fathy

3. Imsingla

4. Stackoverflow

5. Types Categorical

6. Point Plot